2,831 research outputs found

Universal Reinforcement Learning Algorithms: Survey and Experiments

Author: Aslanides John
Hutter Marcus
Leike Jan
Publication date: 30/05/2017

Many state-of-the-art reinforcement learning (RL) algorithms typically assume that the environment is an ergodic Markov Decision Process (MDP). In contrast, the field of universal reinforcement learning (URL) is concerned with algorithms that make as few assumptions as possible about the environment. The universal Bayesian agent AIXI and a family of related URL algorithms have been developed in this setting. While numerous theoretical optimality results have been proven for these agents, there has been no empirical investigation of their behavior to date. We present a short and accessible survey of these URL algorithms under a unified notation and framework, along with results of some experiments that qualitatively illustrate some properties of the resulting policies, and their relative performance on partially-observable gridworld environments. We also present an open-source reference implementation of the algorithms which we hope will facilitate further understanding of, and experimentation with, these ideas.

Comment: 8 pages, 6 figures, Twenty-sixth International Joint Conference on Artificial Intelligence (IJCAI-17)

arXiv.org e-Print Archive

Crossref

The New Jersey Gross Income Tax Act

Author: Aslanides Peter C.
Brescher John B., Jr.
Publication venue: eRepository @ Seton Hall
Publication date: 03/04/2023

Seton Hall University Libraries

Correlation between epithelial thickness in normal corneas, untreated ectatic corneas, and ectatic corneas previously treated with CXL; is overall epithelial thickness a very early ectasia prognostic factor?

Author: Asimellis George
Aslanides Ioannis M
Kanellopoulos Anastasios John
Publication venue: Dove Medical Press

Crossref

PubMed Central

Fine-Tuning Language Models via Epistemic Neural Networks

Author: Asghari Seyed Mohammad
Aslanides John
Irving Geoffrey
McAleese Nat
Osband Ian
Van Roy Benjamin
Publication date: 02/11/2022

Large language models are now part of a powerful new paradigm in machine learning. These models learn a wide range of capabilities from training on large unsupervised text corpora. In many applications, these capabilities are then fine-tuned through additional training on specialized data to improve performance in that setting. In this paper, we augment these models with an epinet: a small additional network architecture that helps to estimate model uncertainty and form an epistemic neural network (ENN). ENNs are neural networks that can know what they don't know. We show that, using an epinet to prioritize uncertain data, we can fine-tune BERT on GLUE tasks to the same performance while using 2x less data. We also investigate performance in synthetic neural network generative models designed to build understanding. In each setting, using an epinet outperforms heuristic active learning schemes.

arXiv.org e-Print Archive